Towards the full information chain theory: solution methods for optimal 



When additional information sources are available in decision making problems that al- 
low stochastic optimization formulations, an important question is how to optimally use the 
information the sources are capable of providing. A framework that relates information ac- 
curacy determined by the source's knowledge structure to its relevance determined by the 
problem being solved was proposed in a companion paper. There, the problem of optimal 
information acquisition was formulated as that of minimization of the expected loss of the 
solution subject to constraints dictated by the information source knowledge structure and 
depth. Approximate solution methods for this problem are developed making use of the
probability metrics method and its application to scenario reduction in stochastic optimization.

I. INTRODUCTION



In many practically important decision making problems where uncertainty about input data is 
present and optimization methods are appropriate, sources of additional information are in principle 
available. Often, information that such sources possess fails to be taken advantage of due to its 
perceived and factual imprecision and to the lack of methodology that allows doing this in a regular 
controlled fashion. Such methodology, if developed in a general setting, would form a branch of the 
science of information which is at present represented by the classical Information Theory and its 
extensions. Generalizing somewhat, one can say that Information Theory explores the implications 
of information quantity while abstracting from the information content and, in particular, its 
accuracy and relevance. Respectively, the classical Information Theory is predominantly a theory 
of information transmission and related activities (such as compression). On the other hand, if 
information is to be acquired and used for decision making, a quantitative framework describing 
these processes would be rather helpful. 

A notion of the full information chain was introduced in earlier work to schematically describe the
typical path of information from acquisition to usage (see Fig. 1 for an illustration). In this context,
the classical Information Theory is a theory of the middle link, while the methodology developed in
[1-3] and the present article concerns the basics of a general theory of the two end links. More
specifically, the information acquisition link was addressed in [1, 2], and the basic framework for
the information use link was proposed in [3], resulting, after making a connection with the results
of [1, 2], in a formulation of the optimal information acquisition problem. This article builds
on these results and proposes specific solution methods for the optimal information acquisition
problem.

The proposed approach, as was mentioned earlier, can be looked upon as an attempt to initiate
a process of extending the classical Information Theory to a theory of the whole information chain.
The field of Information Theory, born from Shannon's work on the theory of communications [4],
has since enjoyed great success in a number of fields that include, besides communication theory,
statistical physics, computer vision [7], climatology [9], physiology [10] and neurophysiology [11].
Generalized Information Theory (see e.g. [12, 13, 14]) addresses problems of characterizing
uncertainty in frameworks that are more general than classical probability, such as Dempster-Shafer
theory [15].



FIG. 1. The full information chain.

The approach developed here is based on a theory of information exchange between the agent
and information source(s) that is developed in [1, 2]. The latter can be thought of as a development
of the general theory of inquiry that goes back to the work of Cox [16-18]. This line of work
received more attention recently, resulting in a formulation of the calculus of inquiry. The
definition of questions adopted in [1] corresponds to the particular subclass of questions - the
partition questions - defined in the calculus of inquiry. It is also related to the measure-dependent
notion of a question introduced in [22]. Our work in [1, 2] goes beyond that on the calculus of inquiry in
that it introduces the concept of pseudoenergy as a measure of source-specific difficulty of various
questions to the given information source. One could say that it develops a quantitative theory of
knowledge as opposed to the theory of information.

One of the successful applications of the order-theoretic approach to fundamental physics is the
recent derivation of Lorentz transformations and the Minkowski metric of special relativity directly
from the consideration of the partial order of events in space-time.

Information Physics is a relatively new branch of physical sciences that studies the role
information plays in fundamental laws of nature. This line of research goes back to the defining
work of Jaynes on the application of the Principle of Maximum Entropy (MaxEnt) to derive
the fundamental laws of thermodynamics. It is related to the proposed framework in that it
addresses information relevance in application to physical sciences. The main Information Physics
hypothesis is that the laws of nature are essentially the laws of inductive inference correctly applied
to respective systems. In order to correctly formulate them one needs to know the degrees of
freedom and the relevant information necessary to completely specify the system state. Recently,
this approach (in modified and extended form) was applied to derive the fundamental laws of
classical [25] and quantum [26] mechanics, and also - very recently - relativistic quantum theory
[27]. A closely related line of research explores the ramifications of general order-theoretic relations.
An interesting example of the latter is the derivation [23] of Lorentz transformations and the Minkowski
metric of special relativity directly from the consideration of the partial order of events in
space-time.

The area of statistical decision making has dealt with the idea of improving solution quality by
means of acquiring additional information. Applications to innovation adoption [28, 29], fashion
decisions [30] and vaccine composition decisions for flu immunization [31] can be
mentioned in this regard. Some authors [32, 33] even introduced models (e.g. the effective information
model) that account for the actual, or effective, amount of information contained in the received
observations. One could also mention the recent work on optimal decision making in the absence
of the knowledge of the distribution shape and parameters [34-36]. The proposed
approach differs in that it explicitly describes, and allows one to optimize over, not just the quantity of
additional information but also its content, and is based on an explicit description of the properties of
information sources.

The related problem of optimal usage of information obtained from experts has been addressed
in the existing research literature mostly in the form of updating the agent's beliefs given probability
assessments from multiple experts [37-40] and optimal combining of expert opinions, including
experts with incoherent and missing outputs [41]. In the present and preceding papers, the emphasis
is on optimizing the particular type of information for the given expert(s) and decision making
problem.

Methodologically, the present article borrows from the field of probability metrics and scenario 
reduction in stochastic optimization. More details, along with relevant references, can be found in 
Appendices. 

The rest of the paper is organized as follows. In Section II, the main results of [1, 2] that are
necessary for the developments in this paper are reviewed. Section III reviews the main results of
[3] where, in particular, the problem of additional information acquisition was formulated in the
specific form that is used here. Section IV develops the main theoretical framework for the use
of scenario reduction methods for optimization of additional information acquisition. Section V
provides an example illustrating the use of methods developed in Section IV. Section VI contains a
conclusion. Appendix A provides proofs for some of the results in the main text. Appendix B gives
some background information on probability metrics in application to stochastic optimization, and
Appendix C contains a very brief review of scenario reduction algorithms.



II. INFORMATION ACCURACY: SOURCE KNOWLEDGE STRUCTURE 



As was explained in [3], the starting point of the whole discussion is a problem of the general
form

$$\min_{x \in X} \int_\Omega f(\omega, x)\, P(d\omega), \tag{1}$$

where $X$ is the set of all feasible solutions, $\Omega$ is a parameter space to which uncertain problem
parameters belong, and $P$ is a fixed initial probability measure (with a suitable sigma-algebra
assumed) on $\Omega$ that describes the initial state of the uncertainty. The function $f: \Omega \times X \to \mathbb{R}$
is assumed to be integrable on $\Omega$ for each $x \in X$. For example, in the context of stochastic
optimization, $X$ is the set of feasible first-stage solutions and $f(\omega, x)$ is the best possible objective
value for the first-stage decision $x$ in the case when the random outcome $\omega$ is observed.
Let $L(P)$ be the expected loss corresponding to measure $P$, defined as

$$L(P) = \int_\Omega f(\omega, x^*_P)\, P(d\omega) - \int_\Omega f(\omega, x^*_\omega)\, P(d\omega), \tag{2}$$

where $x^*_P$ is a solution of (1) and $x^*_\omega$ is a solution of $\min_{x \in X} f(\omega, x)$ for the given $\omega$.
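
For a finitely supported measure and a finite set of candidate solutions, the expected loss (2) can be evaluated by direct enumeration. The following is a minimal sketch under those assumptions; the function name `expected_loss`, the table layout `cost[i][x]` standing for $f(\omega_i, x)$, and the numbers are illustrative and do not come from the paper.

```python
def expected_loss(probs, cost):
    """Return (L(P), index of x*_P) for a discrete measure.

    probs[i] is P({omega_i}); cost[i][x] stands for f(omega_i, x).
    """
    n_sol = len(cost[0])
    # Expected cost of every candidate solution under P, as in problem (1).
    exp_cost = [sum(p * row[x] for p, row in zip(probs, cost)) for x in range(n_sol)]
    x_star = min(range(n_sol), key=exp_cost.__getitem__)
    # Expected cost when the best solution is picked separately for each omega.
    wait_and_see = sum(p * min(row) for p, row in zip(probs, cost))
    return exp_cost[x_star] - wait_and_see, x_star


# Hypothetical data: three scenarios, two candidate solutions.
probs = [0.5, 0.3, 0.2]
cost = [[1.0, 4.0], [3.0, 2.0], [6.0, 1.0]]
loss, x_star = expected_loss(probs, cost)
```

With these numbers the first solution minimizes the expected cost, and the gap to the scenario-wise optimum is the quantity that additional information can reduce.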

The main goal is, as explained in [3], for the given information source(s), to find the way of
extracting information from it so that the resulting expected loss is minimized. The knowledge
structure of the source determines the accuracy of the source's answers. The difference between the
original loss (2) and the loss obtained with the help of the source's answers serves as a measure of
the information relevance.

The process of information exchange between an agent and an information source was described
in [1, 2]. Here we review the main results to make the presentation self-contained. The source
is characterized by its knowledge structure encoded in the form of the question difficulty functional
described below. The agent poses questions for which the source provides answers.

Questions were identified in [1] with partitions $\mathbf{C} = \{C_1, \dots, C_r\}$ of the parameter space $\Omega$ of
the problem. Partitions were allowed to be incomplete, i.e. such that $\bigcup_{j=1}^r C_j \subset \Omega$. The question
difficulty functional was introduced to measure the degree of difficulty of the question to the given
information source, so that the information source would be able to answer questions with lower
values of the difficulty functional more accurately than those with higher values of difficulty. The
specific form of the difficulty functional was determined in [1] by demanding that it satisfy a system
of reasonable postulates that, in particular, imposed the requirements of linearity and isotropy. The
resulting form of the difficulty functional is given in the following theorem.

Theorem 1 Let the function $G(\Omega, \mathbf{C}, P)$ where $\mathbf{C} = \{C_1, \dots, C_r\}$ satisfy Postulates 1 through 6
(see [1]). Then it has the form

$$G(\Omega, \mathbf{C}, P) = \frac{\sum_{j=1}^r u(C_j)\, P(C_j) \log \frac{1}{P(C_j)}}{\int_\Omega u(\omega)\, dP(\omega)},$$

where $u(C_j) = \frac{1}{P(C_j)} \int_{C_j} u(\omega)\, dP(\omega)$ and $u: \Omega \to \mathbb{R}$ is an integrable nonnegative function on the parameter
space $\Omega$.

One can see that the difficulty of the given question $\mathbf{C}$ depends on, besides the initial probability
measure $P$, the function $u: \Omega \to \mathbb{R}_+$ on the parameter space $\Omega$. Using parallels with thermodynamics
(see [1] for more details), this function may be called the pseudotemperature. The question
difficulty then can be interpreted as the amount of pseudoenergy associated with question $\mathbf{C}$.
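
To make the role of the pseudotemperature concrete, the difficulty functional of Theorem 1 can be evaluated for a discrete measure. This is only an illustrative sketch: the name `difficulty`, the representation of a partition as lists of scenario indices, and the sample values are assumptions made here, not notation from [1].

```python
import math


def difficulty(probs, u, partition):
    """Question difficulty G(Omega, C, P) for a discrete measure.

    probs[i] = P({omega_i}), u[i] = u(omega_i); partition is a list of
    blocks, each block a list of scenario indices.
    """
    total = sum(p * ui for p, ui in zip(probs, u))  # integral of u dP over Omega
    num = 0.0
    for block in partition:
        p_c = sum(probs[i] for i in block)           # P(C_j)
        if p_c == 0.0:
            continue                                 # empty-mass blocks contribute nothing
        u_c = sum(probs[i] * u[i] for i in block) / p_c  # u(C_j)
        num += u_c * p_c * math.log(1.0 / p_c)
    return num / total


# With a constant pseudotemperature the difficulty is the Shannon entropy.
g_flat = difficulty([0.25] * 4, [2.0] * 4, [[0, 1], [2, 3]])
# A "hot" scenario isolated in its own block makes the question harder.
g_hot = difficulty([0.25] * 4, [3.0, 1.0, 1.0, 1.0], [[0], [1, 2, 3]])
```

In this sketch a constant $u$ cancels between numerator and denominator, so only a non-uniform pseudotemperature changes the ranking of questions.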

Given a question $\mathbf{C} = \{C_1, \dots, C_r\}$, the information source can provide an answer $V(\mathbf{C})$ that
takes one of the values in the set $\{s_1, \dots, s_m\}$. A reception of the value $s_k$ has the effect of modifying
the original probability measure $P$ on $\Omega$ to a new (updated) measure $P^k$. To ensure that the answer
$V(\mathbf{C})$ is in fact an answer to the (complete) question $\mathbf{C}$ (and no more), the following condition is
required to hold for the updated measures $P^k$, $k = 1, \dots, m$:

$$P^k = \sum_{j=1}^r p_{kj} P_{C_j}, \tag{3}$$

where $p_{kj}$, $k = 1, \dots, m$, $j = 1, \dots, r$, are nonnegative coefficients such that $\sum_{j=1}^r p_{kj} = 1$ for
$k = 1, \dots, m$.

The answer depth functional $Y(\Omega, \mathbf{C}, P, V(\mathbf{C}))$ for the answer $V(\mathbf{C})$ to question $\mathbf{C}$ measures
the amount of pseudoenergy that is conveyed by $V(\mathbf{C})$ in response to question $\mathbf{C}$. The general
form of $Y(\Omega, \mathbf{C}, P, V(\mathbf{C}))$ can be established if certain reasonable requirements (postulates) it has
to satisfy are imposed. A system of postulates proposed in [2] parallels the postulates for
question difficulty and, in particular, imposes the requirements of linearity and isotropy. The
following theorem was then proved in [2].

Theorem 2 The answer depth functional $Y(\Omega, \mathbf{C}, P, V(\mathbf{C}))$ has the form

$$Y(\Omega, \mathbf{C}, P, V(\mathbf{C})) = \sum_{k=1}^m \Pr(V(\mathbf{C}) = s_k)\, \frac{\sum_{j=1}^r u(C_j)\, P^k(C_j) \log \frac{P^k(C_j)}{P(C_j)}}{\int_\Omega u(\omega)\, dP(\omega)},$$

where $P^k$ is the measure on $\Omega$ updated by the reception of $V(\mathbf{C}) = s_k$, $u(C_j) = \frac{1}{P(C_j)} \int_{C_j} u(\omega)\, dP(\omega)$,
and the function $u: \Omega \to \mathbb{R}$ is the same function that is used in the question difficulty functional
$G(\Omega, \mathbf{C}, P)$.

It can be shown (see [9] for details) that if $V(\mathbf{C})$ is any answer to the question $\mathbf{C}$ then
$Y(\Omega, \mathbf{C}, P, V(\mathbf{C})) \le G(\Omega, \mathbf{C}, P)$, with equality if and only if the answer $V(\mathbf{C}) = V^*(\mathbf{C})$ is
perfect, i.e. $P^j = P_{C_j}$ for $j = 1, \dots, r$.

As far as answers that are not perfect are concerned, it is convenient to consider the class of
answers for which the degree of imperfection is described by a single error probability $\alpha$ - the
quasi-perfect answers [2]. For a quasi-perfect answer $V_\alpha(\mathbf{C})$ to a (complete) question $\mathbf{C} = \{C_1, \dots, C_r\}$,
the coefficients $p_{kj}$ have the form $p_{kj} = (1 - \alpha)\delta_{kj} + \alpha P(C_j)$ for $k = 1, \dots, r$ and $j = 1, \dots, r$, and
the updated measure is simply

$$P^k = \alpha P + (1 - \alpha) P_{C_k}$$

for $k = 1, \dots, r$. Clearly, for $\alpha = 0$ a quasi-perfect answer to $\mathbf{C}$ becomes a perfect one. It can be
shown (see [2]) that the answer depth functional for a quasi-perfect answer $V_\alpha(\mathbf{C})$ to question $\mathbf{C}$
can be written as

$$Y(\Omega, \mathbf{C}, P, V_\alpha(\mathbf{C})) = \frac{1}{\int_\Omega u(\omega)\, dP(\omega)} \left[ \sum_{k=1}^r u(C_k) P(C_k) (1 - \alpha + \alpha P(C_k)) \log \frac{1 - \alpha + \alpha P(C_k)}{P(C_k)} + \alpha \log \alpha \sum_{k=1}^r u(C_k) P(C_k) (1 - P(C_k)) \right],$$

which can be seen to reduce to $G(\Omega, \mathbf{C}, P)$ for $\alpha = 0$ (when $V(\mathbf{C}) = V^*(\mathbf{C})$) and vanish for $\alpha = 1$.
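
The two limiting cases just mentioned are easy to check numerically. The sketch below evaluates the quasi-perfect answer depth from block-level quantities; it assumes the depth carries the same normalization by $\int_\Omega u\, dP$ as the difficulty in Theorem 1, and the function name and sample values are hypothetical.

```python
import math


def quasi_perfect_depth(block_probs, block_u, alpha):
    """Answer depth of a quasi-perfect answer with error probability alpha.

    block_probs[k] = P(C_k) > 0, block_u[k] = u(C_k).
    Assumption: the depth is normalized by the integral of u dP.
    """
    total = sum(p * u for p, u in zip(block_probs, block_u))
    depth = 0.0
    for p, u in zip(block_probs, block_u):
        own = 1.0 - alpha + alpha * p            # P^k(C_k), mass left on the indicated block
        depth += u * p * own * math.log(own / p)
        if alpha > 0.0:                          # alpha * log(alpha) -> 0 as alpha -> 0
            depth += alpha * math.log(alpha) * u * p * (1.0 - p)
    return depth / total


p_blocks, u_blocks = [0.5, 0.5], [1.0, 1.0]
d_perfect = quasi_perfect_depth(p_blocks, u_blocks, 0.0)
d_half = quasi_perfect_depth(p_blocks, u_blocks, 0.5)
d_useless = quasi_perfect_depth(p_blocks, u_blocks, 1.0)
```

For two equally likely blocks and constant $u$, the perfect answer carries $\log 2$, the answer with $\alpha = 1$ carries nothing, and intermediate values of $\alpha$ fall in between.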
An information source model provides a connection between questions and the corresponding
answers. It was defined in [2] as a function $h$ such that

$$Y(\Omega, \mathbf{C}, P, V(\mathbf{C})) = h(G(\Omega, \mathbf{C}, P)).$$

The simplest information source model considered in [2] is the simple capacity model given by

$$h(x) = \begin{cases} x & \text{if } x \le Y_s, \\ Y_s & \text{if } x > Y_s, \end{cases} \tag{4}$$

which is fully characterized by the single parameter $Y_s$ which has the meaning of the information
source capacity. The most apparent drawback of model (4) is that, according to it, the source
provides a perfect answer to any question with difficulty not exceeding the source capacity. The
linear modified capacity model described by

$$h(x) = \begin{cases} bx & \text{if } x \le Y_s / b, \\ Y_s & \text{if } x > Y_s / b, \end{cases} \tag{5}$$

removes this drawback at the expense of one extra parameter $b < 1$ that has to be estimated.
Several other models were proposed in [2].
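
The two source models can be written as one-line functions, which makes the difference between them easy to see. This is a sketch only: the function names are chosen here, and the breakpoint $Y_s / b$ of the linear modified model is the value at which the two branches meet, which is assumed rather than quoted from [2].

```python
def simple_capacity(x, capacity):
    """Model (4): answer depth equals difficulty up to the source capacity Y_s."""
    return min(x, capacity)


def linear_modified_capacity(x, capacity, b):
    """Model (5): depth grows as b * x and saturates at Y_s (assumes 0 < b < 1)."""
    if not 0.0 < b < 1.0:
        raise ValueError("b is expected to lie strictly between 0 and 1")
    return min(b * x, capacity)


# A question of difficulty 0.4 posed to a source of capacity 1.0:
depth_simple = simple_capacity(0.4, 1.0)                  # answered perfectly
depth_linear = linear_modified_capacity(0.4, 1.0, 0.5)    # answered only partially
```

Under the simple model the depth equals the difficulty (a perfect answer), while under the linear modified model the depth stays strictly below the difficulty for every nontrivial question.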

The values of model parameters as well as pseudotemperature functions for information sources
can be estimated from the observed sources' performance on sample questions. The corresponding
estimation procedures were also discussed in [2].

III. INFORMATION RELEVANCE: LOSS REDUCTION

The basic framework for the information use link description was presented in [3]. We briefly
summarize the main points here to make a transition to the subject of this article in a self-contained
manner.

Consider the set $\mathcal{G}$ of all maps from $\Omega$ into $X$ with a discrete image set. Any such map $g \in \mathcal{G}$ can
be uniquely described by the corresponding partition $\mathbf{C} = \{C_1, \dots, C_r\}$ of $\Omega$ and the corresponding
image set $I = \{x_1, \dots, x_r\}$ such that $g(\omega) = x_j$ for all $\omega \in C_j$. Let $P$ be any probability measure
on $\Omega$, let $x$ be an arbitrary element of the solution space $X$, and let $g \in \mathcal{G}$ be an arbitrary map from
$\Omega$ into $X$. The suboptimality, loss and gain functionals are defined (see [3]) as

$$S(x, P) = E_P f(\omega, x) - E_P f(\omega, x^*_P) = \int_\Omega \left( f(\omega, x) - f(\omega, x^*_P) \right) P(d\omega), \tag{6}$$

$$L(g, P) = E_P f(\omega, g(\omega)) - E_P f(\omega, x^*_\omega) = \int_\Omega \left( f(\omega, g(\omega)) - f(\omega, x^*_\omega) \right) P(d\omega), \tag{7}$$

and

$$B(g, P) = E_P f(\omega, x^*_P) - E_P f(\omega, g(\omega)) = \int_\Omega \left( f(\omega, x^*_P) - f(\omega, g(\omega)) \right) P(d\omega), \tag{8}$$

respectively.

Moreover, it is convenient to introduce the corresponding functionals not just for a fixed measure
$P$, but also for the given question $\mathbf{C}$ and a given answer $V(\mathbf{C})$. For example, for an arbitrary $x \in X$,
the suboptimality of solution $x$ with respect to question $\mathbf{C}$ (and initial measure $P$) is given by

$$S(x, P_{\mathbf{C}}) = \sum_{j=1}^r P(C_j)\, S(x, P_{C_j}), \tag{9}$$

and the suboptimality of $x$ with respect to answer $V(\mathbf{C})$ to question $\mathbf{C}$ (and initial measure $P$)
reads

$$S(x, P_{V(\mathbf{C})}) = \sum_{k=1}^m v_k\, S(x, P^k), \tag{10}$$

where $v_k = \Pr(V(\mathbf{C}) = s_k)$ for brevity. The loss and gain functionals for the given map $g \in \mathcal{G}$ and
question $\mathbf{C}$ and answer $V(\mathbf{C})$ are defined analogously.

Note that each map $g = (\mathbf{C}(g), I(g))$ from the set $\mathcal{G}$ can be characterized by the corresponding
loss $L(g, P)$ with respect to the original measure $P$ and the value $G(\Omega, \mathbf{C}(g), P)$ - the difficulty
of the corresponding question. The efficient frontier in the Euclidean plane with coordinates
$(G(\Omega, \mathbf{C}(g), P), L(g, P))$ can be found by solving the following parametric optimization problem

$$\begin{aligned} &\text{minimize} && L(g, P) \\ &\text{subject to} && G(\Omega, \mathbf{C}(g), P) \le \gamma \end{aligned} \tag{11}$$

for all values of the parameter $\gamma$.

The maps $g$ that are solutions of (11) for various values of the parameter $\gamma$ possess the property
of having the smallest possible loss among the maps corresponding to questions whose difficulty does
not exceed the given value $\gamma$. We denote by $\mathcal{G}^*$ the subset of all maps in $\mathcal{G}$ that are solutions of (11)
and by $\mathcal{S}$ the set of all subset-optimal maps, i.e. maps of the form $(\{C_1, \dots, C_r\}, \{x^*_{P_{C_1}}, \dots, x^*_{P_{C_r}}\})$,
where $x^*_{P_{C_j}}$ is an optimal solution of problem (1) with measure $P$ replaced with the conditional
measure $P_{C_j}$. Then, as was shown in [3],

$$\mathcal{G}^* \subseteq \mathcal{S}, \tag{12}$$

i.e. if one is interested in finding Pareto-optimal maps in $\mathcal{G}$ it is sufficient to consider subset-optimal
maps only. We call a partition $\mathbf{C}$ optimal if the corresponding map $g = (\{C_1, \dots, C_r\}, \{x^*_{P_{C_1}}, \dots, x^*_{P_{C_r}}\}) \in \mathcal{S}$
belongs to the set of Pareto-optimal maps. So the problem of finding maps in the set $\mathcal{G}^*$ is
equivalent to that of searching for optimal partitions of the parameter space $\Omega$.
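
For a handful of scenarios the efficient frontier can be found by brute force, which is a useful reference point for the approximate methods developed later. The sketch below enumerates all partitions, evaluates the difficulty and the loss of the subset-optimal map for each, and keeps the non-dominated points. It is exponential in the number of scenarios; all function names, the cost table `cost[i][x]` standing for $f(\omega_i, x)$, and the data are illustrative assumptions.

```python
import math


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):            # put `first` into an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller                # or into a block of its own


def difficulty(probs, u, partition):
    """G(Omega, C, P) for a discrete measure with positive block masses."""
    total = sum(p * ui for p, ui in zip(probs, u))
    num = 0.0
    for block in partition:
        p_c = sum(probs[i] for i in block)
        u_c = sum(probs[i] * u[i] for i in block) / p_c
        num += u_c * p_c * math.log(1.0 / p_c)
    return num / total


def subset_optimal_loss(probs, cost, partition):
    """L(g, P) of the subset-optimal map: best single solution per block."""
    n_sol = len(cost[0])
    wait_and_see = sum(p * min(row) for p, row in zip(probs, cost))
    with_partition = sum(
        min(sum(probs[i] * cost[i][x] for i in block) for x in range(n_sol))
        for block in partition
    )
    return with_partition - wait_and_see


def efficient_frontier(probs, u, cost):
    """Non-dominated (difficulty, loss, partition) triples, by increasing difficulty."""
    points = [(difficulty(probs, u, c), subset_optimal_loss(probs, cost, c), c)
              for c in set_partitions(list(range(len(probs))))]
    points.sort(key=lambda t: (t[0], t[1]))
    frontier, best = [], float("inf")
    for g, loss, c in points:
        if loss < best - 1e-12:                  # strictly better than anything easier
            frontier.append((g, loss, c))
            best = loss
    return frontier


frontier = efficient_frontier([0.5, 0.3, 0.2], [1.0, 1.0, 1.0],
                              [[1.0, 4.0], [3.0, 2.0], [6.0, 1.0]])
```

In this toy instance the frontier has three points: the trivial question, the question that isolates the third scenario, and the question that isolates the first one, which already removes the whole loss.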

Let us now address the optimal information acquisition problem (14): what question(s) need to
be asked of the given information source in order to obtain the minimum possible loss for (1). Given
a question $\mathbf{C} = \{C_1, \dots, C_r\}$ to an information source and its answer $V(\mathbf{C})$ taking values in the
set $\{s_1, \dots, s_m\}$, we denote by $L(s_k)$, $k = 1, \dots, m$, the minimum conditional expected loss given
that $V(\mathbf{C}) = s_k$ and by $L(V(\mathbf{C}))$ the minimum expected loss that the agent can achieve given the
answer $V(\mathbf{C})$. The latter can be found as

$$L(V(\mathbf{C})) = \sum_{k=1}^m \Pr(V(\mathbf{C}) = s_k)\, L(s_k), \tag{13}$$

i.e. as an expectation over possible values of the answer $V(\mathbf{C})$.

If the agent poses a question $\mathbf{C} = \{C_1, \dots, C_r\}$ to the information source and receives a particular
value $s_k$ of answer $V(\mathbf{C})$, the original measure $P$ on $\Omega$ gets updated to $P^k$. Therefore, in order
to minimize loss for the given value $s_k$ of answer $V(\mathbf{C})$, the agent needs to choose the solution $x^*_{P^k}$,
i.e. the solution minimizing the expectation $E_{P^k} f(\omega, x)$ over all (feasible) values of $x$.

The next two propositions, proved in [3], give the minimum expected loss achievable with a
perfect and a general answer to question $\mathbf{C}$, respectively.

Proposition 1 Let $\mathbf{C} = \{C_1, \dots, C_r\}$ be a complete question and $g_{\mathbf{C},P} \in \mathcal{S}$ be a corresponding
subset-optimal map. If the agent is given a perfect answer $V^*(\mathbf{C})$ to $\mathbf{C}$ then

$$L(V^*(\mathbf{C})) = L(g_{\mathbf{C},P}, P).$$

Proposition 2 Let $\mathbf{C} = \{C_1, \dots, C_r\}$ be a complete question and $g_{\mathbf{C},P} \in \mathcal{S}$ be a corresponding
subset-optimal map. If the agent is given a (generally imperfect) answer $V(\mathbf{C})$ to $\mathbf{C}$ then

$$L(V(\mathbf{C})) = B(g_{\mathbf{C},P}, P_{V(\mathbf{C})}) + L(g_{\mathbf{C},P}, P).$$

The information acquisition optimization problem can then be written as

$$\begin{aligned} &\text{minimize} && L(V(\mathbf{C})) \\ &\text{subject to} && Y(\Omega, \mathbf{C}, P, V(\mathbf{C})) = h(G(\Omega, \mathbf{C}, P)), \end{aligned} \tag{14}$$

where the minimum expected loss $L(V(\mathbf{C}))$ is given by either Proposition 1 or Proposition 2. The
source model function $h(\cdot)$ and the pseudotemperature function $u(\cdot)$ that enters the expressions
for the question difficulty and answer depth in (14) are assumed to be known.
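
The objective of (14) can be evaluated directly from (13) when the answer is quasi-perfect. The following sketch does this for a discrete measure; it assumes the answer value $s_k$ is received with probability $P(C_k)$, which is the choice that makes the updated measures average back to $P$, and the function name and data are hypothetical.

```python
def answer_loss(probs, cost, partition, alpha):
    """Minimum expected loss L(V(C)) for a quasi-perfect answer.

    probs[i] = P({omega_i}); cost[i][x] stands for f(omega_i, x);
    partition is a list of blocks of scenario indices; alpha is the
    error probability. Assumption: Pr(V(C) = s_k) = P(C_k).
    """
    n_sol = len(cost[0])
    wait_and_see = sum(p * min(row) for p, row in zip(probs, cost))
    total = 0.0
    for block in partition:
        p_c = sum(probs[i] for i in block)
        # Updated measure P^k = alpha * P + (1 - alpha) * P_{C_k}.
        updated = [alpha * p + ((1.0 - alpha) * p / p_c if i in block else 0.0)
                   for i, p in enumerate(probs)]
        # The agent re-optimizes against the updated measure.
        best = min(sum(q * row[x] for q, row in zip(updated, cost))
                   for x in range(n_sol))
        total += p_c * best
    return total - wait_and_see


probs = [0.5, 0.3, 0.2]
cost = [[1.0, 4.0], [3.0, 2.0], [6.0, 1.0]]
question = [[2], [0, 1]]
loss_perfect = answer_loss(probs, cost, question, 0.0)
loss_noisy = answer_loss(probs, cost, question, 0.5)
loss_useless = answer_loss(probs, cost, question, 1.0)
```

With $\alpha = 0$ the result is the loss of the subset-optimal map, as in Proposition 1; with $\alpha = 1$ the answer carries no information and the loss equals the original expected loss.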

It is easy to see that if a source is capable of providing perfect answers (for instance, in the
simple capacity model (4)), the solution of problem (14) reduces to finding the efficient frontier: if $L^*(G)$ is
the expression describing the efficient frontier (abstracting from its true discrete structure) and
$Y_s$ is the capacity of the information source, then the minimum in (14) is equal to $L^*(Y_s)$ and is
achieved by the question $\mathbf{C}$ lying on the efficient frontier such that $G(\Omega, \mathbf{C}, P) = Y_s$.

If a source cannot provide perfect answers, questions with difficulty exceeding the source
capacity need to be considered in order to minimize the expected loss. The search for an optimal
question in this case becomes more complicated as the error structure for the source's answers
needs to be taken into account. If answers are assumed to be quasi-perfect, optimal question(s)
can be found approximately provided the efficient frontier is already known.

IV. INFORMATION ACQUISITION OPTIMIZATION 



As stated in the previous section, a solution of the optimal information acquisition problem
(14) is greatly facilitated by the search for the efficient frontier $L^*(G)$ in the set of all questions.
To find the latter, one needs to determine optimal partitions of the parameter space $\Omega$. It turns
out that the methods of measure (scenario) reduction developed previously for solving stochastic
optimization problems can also be helpful for the task of searching for optimal partitions.



A. Measure reduction and optimal partitions 

In the following, we assume that the (initial) probability measure $P$ is supported at a discrete
set $\{\omega_1, \dots, \omega_N\} = \Omega_N \subset \Omega$:

$$P = \sum_{i=1}^N p_i \delta_{\omega_i}, \tag{15}$$

where $\delta_\omega$ is a Dirac delta that puts a unit mass at $\omega$. Points $\omega_i \in \Omega_N$ are usually referred to
as scenarios. The scenario reduction methodology (see Appendix C) is often used in stochastic
optimization to lower the computational complexity of various practically important problems. In
the scenario reduction approach, the original discrete measure $P$ given by (15) is said to be reduced to
another discrete measure $Q$ given by

$$Q = \sum_{j=1}^M q_j \delta_{\tilde{\omega}_j}, \tag{16}$$

if the support $\{\tilde{\omega}_1, \dots, \tilde{\omega}_M\}$ of $Q$ is a subset of $\Omega_N$.

For later convenience, we denote by $\mathcal{R}_M(\Omega_N)$ the set of all scenario reduction maps from the
set of measures of the form (15) supported at $\Omega_N$ into the set of all measures of the form (16)
supported at some subset of $\Omega_N$ of cardinality $M < N$ satisfying the additional property that we
call simplicity. A map $\nu \in \mathcal{R}_M(\Omega_N)$ is called simple if there exists a partition $\{S_1, \dots, S_M\}$ of the
set of scenarios $\Omega_N$ such that $\nu(\omega_i) = \tilde{\omega}_j$ for all $\omega_i \in S_j$ and $q_j = \sum_{\{i:\, \omega_i \in S_j\}} p_i$. In such a case we
write $Q = \nu(P)$ and $S_j = \nu^{-1}(\tilde{\omega}_j)$ for $j = 1, \dots, M$.

Additionally, if $c: \Omega \times \Omega \to \mathbb{R}_+$ is some symmetric cost function, we call a map $\nu \in \mathcal{R}_M(\Omega_N)$
c-optimal if $\nu(\omega_i) \in \arg\min_{\tilde{\omega}_j \in \nu(\Omega_N)} c(\tilde{\omega}_j, \omega_i)$ for $i = 1, \dots, N$. It is shown in [42] that the Monge-Kantorovich
functional (see Appendix B) $\hat{\mu}_c(P, Q)$ is minimized over all measures $Q$ supported at $\{\tilde{\omega}_1, \dots, \tilde{\omega}_M\} =
\nu(\Omega_N)$ iff the corresponding simple scenario reduction map is c-optimal.
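
For a fixed set of retained scenarios, a simple c-optimal reduction amounts to sending every scenario to its closest retained scenario and adding up the probabilities that arrive there. The sketch below illustrates this for scalar scenarios; the function name, the argument layout and the data are assumptions made for illustration, and ties are broken by the order of `kept`.

```python
def reduce_measure(points, probs, kept, cost):
    """Simple scenario reduction to the retained indices `kept`.

    Returns (assign, q, moved): assign[i] is the retained index that
    scenario i is mapped to, q[j] is the reduced probability of retained
    scenario j, and moved is the probability-weighted transport cost.
    """
    # Nearest retained scenario for every original scenario.
    assign = [min(kept, key=lambda j: cost(points[i], points[j]))
              for i in range(len(points))]
    q = {j: 0.0 for j in kept}
    moved = 0.0
    for i, j in enumerate(assign):
        q[j] += probs[i]                          # q_j collects the mass of S_j
        moved += probs[i] * cost(points[i], points[j])
    return assign, q, moved


points = [0.0, 1.0, 2.0, 10.0]
probs = [0.1, 0.5, 0.2, 0.2]
assign, q, moved = reduce_measure(points, probs, kept=[1, 3],
                                  cost=lambda a, b: abs(a - b))
```

Here the three scenarios near the origin collapse onto the scenario at 1.0, the outlier keeps its own mass, and `moved` is the cost of the corresponding transport plan.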

In the following we call measures $P$ and $Q$ $\mathbf{C}$-equivalent for some partition $\mathbf{C}$ of $\Omega$ if $P(C) =
Q(C)$ for all $C \in \mathbf{C}$. It is easy to see that measures $P$ and $Q$ are $\mathbf{C}$-equivalent for all possible
partitions $\mathbf{C}$ if and only if $P = Q$, but two distinct measures can easily be $\mathbf{C}$-equivalent for a
specific partition $\mathbf{C}$. In particular, any two measures on $\Omega$ are $\mathbf{C}$-equivalent if $\mathbf{C}$ is the trivial
partition $\mathbf{C} = \{\Omega\}$.






Given a probability measure $P$ on $\Omega$ and some measure $Q$ that was obtained from $P$ by a
reduction, let us denote by $\mathcal{Q}(Q \mid P)$ the virtual pseudoenergy content of measure $Q$ relative to $P$.
It is defined as

$$\mathcal{Q}(Q \mid P) = G(\Omega, \mathbf{C}_f(P), P) - G(\Omega, \mathbf{C}_f(Q), Q), \tag{17}$$

i.e. $\mathcal{Q}(Q \mid P)$ is the difference between the difficulties of the exhaustive questions $\mathbf{C}_f(P)$ and $\mathbf{C}_f(Q)$
associated with measures $P$ and $Q$, respectively. One can think about the virtual pseudoenergy of $Q$ relative to $P$ as
an amount of pseudoenergy a source would need to supply in order to obtain a new state in which
the hardest possible question has a difficulty equal to $G(\Omega, \mathbf{C}_f(Q), Q)$. Since no question is in fact
answered in going from measure $P$ to the reduced measure $Q$, we call this pseudoenergy virtual.

We can now introduce the virtual difficulty of question $\mathbf{C}$ for measure $Q$ with respect to
measure $P$:

$$G_P(\Omega, \mathbf{C}, Q) = \mathcal{Q}(Q \mid P) + G(\Omega, \mathbf{C}, Q). \tag{18}$$

In particular, $G_P(\Omega, \mathbf{C}, P) = G(\Omega, \mathbf{C}, P)$, i.e. the virtual difficulty of $\mathbf{C}$ for measure $P$ relative to
$P$ reduces just to the standard difficulty of $\mathbf{C}$.

It also turns out to be useful to introduce the relative expected loss for partitions of $\Omega$ and
measures $Q$ obtained from the original measure $P$ by a (simple) scenario reduction operation.
In other words, we assume that there exists $\nu \in \mathcal{R}_M(\Omega_N)$ for some value of $M < N$ such that
$Q = \nu(P)$. The relative (to measure $P$) expected loss of partition $\mathbf{C}$ and measure $Q$ is then defined
as

$$L_P(\mathbf{C}, Q) = \sum_{C \in \mathbf{C}} P(C)\, L(g_{\mathbf{C},Q}, P_C), \tag{19}$$

where $g_{\mathbf{C},Q}$ is the subset-optimal map for partition $\mathbf{C}$ and measure $Q$. In particular, if $\mathbf{C}$ is
the trivial partition $\mathbf{C} = \{\Omega\}$, the loss of $Q$ relative to $P$ is simply [43] $L_P(Q) = L(g_Q, P)$. If
the measure $Q$ coincides with $P$, the loss relative to $P$ is just the standard expected loss of the
corresponding subset-optimal map: $L_P(\mathbf{C}, P) = L(g_{\mathbf{C},P}, P)$.

Let us now consider the following construction. Reduce the original measure $P$ to $Q$ that is
supported at $r$ points: $Q = \nu(P)$, where $\nu \in \mathcal{R}_r(\Omega_N)$. Let $Q = \sum_{j=1}^r q_j \delta_{\tilde{\omega}_j}$ and let $S_j$ be the preimage
of $\tilde{\omega}_j$ under the map $\nu$: $\nu(\omega_i) = \tilde{\omega}_j$ for all $\omega_i \in S_j$. Then let $\mathbf{C}$ be a partition of $\Omega$ such that $S_j \subset C_j$
for $j = 1, \dots, r$. We say that the partition $\mathbf{C}$ is generated by the map $\nu \in \mathcal{R}_r(\Omega_N)$ or, equivalently,
by the reduction of measure $P$ to $Q$. Let $\tilde{\mathbf{C}}$ be an arbitrary coarsening of $\mathbf{C}$.

We are interested in the location of points $P$, $Q$, $(\mathbf{C}, P)$, $(\mathbf{C}, Q)$, $(\tilde{\mathbf{C}}, P)$ and $(\tilde{\mathbf{C}}, Q)$ on the plane
with coordinates $(G_P(\Omega, \cdot, \cdot), L_P(\cdot, \cdot))$. First of all, it is clear that $G_P(\Omega, P) = 0$ and $L_P(P) = L(g_P, P)$
where $L(g_P, P)$ is the EVPI of problem (1). Second, it is also clear that

$$\begin{aligned} G_P(\Omega, \mathbf{C}, Q) &= \mathcal{Q}(Q \mid P) + G(\Omega, \mathbf{C}, Q) \\ &= G(\Omega, \mathbf{C}_f(P), P) - G(\Omega, \mathbf{C}_f(Q), Q) + G(\Omega, \mathbf{C}, Q) \\ &= G(\Omega, \mathbf{C}_f(P), P) \end{aligned} \tag{20}$$

since $\mathbf{C} = \mathbf{C}_f(Q)$ by construction of $Q$. In words, the virtual difficulty of the question $\mathbf{C}$ for
measure $Q$, where the partition $\mathbf{C}$ was generated by a reduction of the original measure $P$ to $Q$, is
equal to the difficulty of the exhaustive question for the original measure $P$.
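
Identity (20) can be checked numerically on a small example. The sketch below computes the virtual difficulty (18) for a reduced measure and compares it with the difficulty of the exhaustive question for the original measure. The function names, the convention of storing the reduced measure as a vector with zeros at deleted scenarios, and the sample values are all assumptions made for this illustration.

```python
import math


def difficulty(probs, u, partition):
    """G(Omega, C, P) for a discrete measure; blocks of zero mass are skipped."""
    total = sum(p * ui for p, ui in zip(probs, u))
    num = 0.0
    for block in partition:
        p_c = sum(probs[i] for i in block)
        if p_c == 0.0:
            continue
        u_c = sum(probs[i] * u[i] for i in block) / p_c
        num += u_c * p_c * math.log(1.0 / p_c)
    return num / total


def virtual_difficulty(p, q, u, partition):
    """G_P(Omega, C, Q): virtual pseudoenergy of Q relative to P plus G(Omega, C, Q)."""
    finest_p = [[i] for i in range(len(p))]
    finest_q = [[i] for i in range(len(q)) if q[i] > 0.0]
    virtual = difficulty(p, u, finest_p) - difficulty(q, u, finest_q)
    return virtual + difficulty(q, u, partition)


p = [0.1, 0.5, 0.2, 0.2]
q = [0.0, 0.8, 0.0, 0.2]            # P reduced to scenarios 1 and 3
u = [1.0, 2.0, 1.0, 3.0]
generated = [[0, 1, 2], [3]]        # partition generated by the reduction
lhs = virtual_difficulty(p, q, u, generated)
rhs = difficulty(p, u, [[i] for i in range(len(p))])
```

The two numbers agree because, for the generated partition, each block of $Q$ carries all of its mass at a single retained scenario.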

To obtain relationships between relative expected losses the following two auxiliary lemmas are 
needed. 

Lemma 1 Let $c_{ij} = c_{ji}$, $i, j = 1, \dots, N$, be a symmetric matrix with elements $c_{ij}$ satisfying the
triangle inequality $c_{ij} \le c_{ik} + c_{kj}$. Let $\{p_i\}_{i=1}^N$ be a probability distribution. Then

N N N 

1=1 i=i j=i 

Proof: See Appendix A. □ 

The second lemma states a useful probability metrics result. Let $P = \sum_{i=1}^N p_i \delta_{\omega_i}$ be a discrete
support probability measure on $\Omega$ and let $Q = \sum_{i=1}^M q_i \delta_{\tilde{\omega}_i}$ be another such measure. Let $\zeta_c(P, Q)$ be
a Fortet-Mourier metric for some cost function $c: \Omega \times \Omega \to \mathbb{R}_+$ that satisfies the conditions described
in Appendix B. Finally, let $\mathbf{C} = \{C_1, \dots, C_r\}$ be a partition of $\Omega$ such that the measures $P$ and $Q$
are $\mathbf{C}$-equivalent.

Lemma 2 Under the assumptions described above,

1. $\zeta_c(P, Q) \le \sum_{j=1}^r w_j\, \zeta_c(P_{C_j}, Q_{C_j})$, where $w_j = P(C_j) = Q(C_j)$.

2. If $Q$ is generated by some map $\nu \in \mathcal{R}_r(\Omega_N)$ that is $\hat{c}$-optimal, where $\hat{c}$ is the reduced cost
function defined as in (B8), then

$$\zeta_c(P, Q) = \sum_{j=1}^r w_j\, \zeta_c(P_{C_j}, Q_{C_j}).$$

Proof: See Appendix A. □

Now, assume that the integrand $f(\omega, x)$ in (1) is in the class $\mathcal{F}_c$ defined in Appendix B for some
symmetric cost function $c: \Omega \times \Omega \to \mathbb{R}_+$ that satisfies the conditions described in Appendix B. The
following proposition describes a relation between relative expected losses for measures $P$ and $Q$.

Proposition 3 Let $\mathbf{C}$ be a partition of $\Omega$ generated by a reduction of a measure $P$ with support at
$\Omega_N \subset \Omega$ to $Q$ by means of a $\hat{c}$-optimal map $\nu \in \mathcal{R}_r(\Omega_N)$ and let $\tilde{\mathbf{C}}$ be any coarsening of $\mathbf{C}$ (including
$\mathbf{C}$ itself). Then

$$L_P(\tilde{\mathbf{C}}, Q) \le L_P(\tilde{\mathbf{C}}, P) + 2K \zeta_c(P, Q),$$

where $K > 0$ is some constant that does not depend on measures $Q$ and $P$.
Proof: See Appendix A. □

If we use the trivial partition $\tilde{\mathbf{C}} = \{\Omega\}$ (which is obviously a coarsening of any $\mathbf{C}$) in
Proposition 3 we can obtain an upper bound on the relative loss of $Q$ with respect to $P$ which we formulate
as a corollary.

Corollary 1 The loss of the reduced measure $Q$ relative to $P$ can be bounded from above as

$$L_P(Q) \le L(g_P, P) + 2K \zeta_c(P, Q),$$

where $L(g_P, P) = L_P(P)$ is the EVPI of the original problem (1).

The following proposition relates the expected loss of a subset-optimal map based on a partition
generated by a reduction of the original measure $P$ to measure $Q$ to the Fortet-Mourier distance
between $P$ and $Q$.

Proposition 4 Let $\mathbf{C}$ be a partition of $\Omega$ generated by a reduction of a measure $P$ supported at
the discrete set $\Omega_N \subset \Omega$ to measure $Q$ by means of a $\hat{c}$-optimal map $\nu \in \mathcal{R}_r(\Omega_N)$. Then

$$L_P(\mathbf{C}, P) = L(g_{\mathbf{C},P}, P) \le 2K \zeta_c(P, Q),$$

where $K > 0$ is a constant.
Proof: See Appendix A. □

Fig. 2 shows the locations of various points on the $(G_P(\Omega, \cdot, \cdot), L_P(\cdot, \cdot))$ coordinate plane.
Several useful observations can now be made.

• The result of Proposition 4 suggests that good (near-optimal) partitions of $\Omega$ can be
generated by a reduction of the original measure $P$ to a measure $Q$ that is (i) supported at a
few points and (ii) has a low value of the Fortet-Mourier metric $\zeta_c(P, Q) = \hat{\mu}_{\hat{c}}(P, Q)$. The
latter value of the Monge-Kantorovich functional $\hat{\mu}_{\hat{c}}(P, Q)$ with the reduced cost $\hat{c}$ can be
computed as that of a minimum-cost transportation problem.




FIG. 2. Pseudoenergy (including virtual pseudoenergy) vs. relative loss. Axis label: $G_P(\Omega, \cdot, \cdot)$.



• For a wide class of linear multi-period two-stage stochastic optimization problems, the
relevant cost function $c$ is given by $c_p$ (see Appendix B) with $p = l + 1$ where $l$ is the number
of periods. The corresponding minimum-cost transportation problem can easily be solved
exactly for a fixed support of measure $Q$ and approximately if the support itself needs to be
optimized (see Appendix C for details).

• The optimality "price" one pays for scenario reduction from the original measure $P$ to
a simpler measure $Q$ - which can be thought of as adding information that is minimally
relevant to the problem in question without actually finding it - can be estimated by the
amount $2K \hat{\mu}_{\hat{c}}(P, Q)$. This implies, in particular, that one could do a scenario reduction
before starting the search for the efficient frontier. In fact, scenario reduction and additional
information acquisition are complementary to each other in the sense of information: scenario
reduction, as already mentioned, can be thought of as an addition of information that is
minimally relevant, as opposed to information acquisition optimization, where one looks for
maximally relevant information.

It is now possible to formulate an efficient approximate algorithm for optimal partition determi- 
nation. 

B. Efficient frontier algorithm 

Proposition 4 provides a useful tool for approximating the efficient frontier. Specifically, one
can use the following algorithm (here and later we assume that the original measure $P$ on $\Omega$ has
support at a discrete set $\Omega_N \subset \Omega$ consisting of $N$ points).
1. Choose an integer parameter $r \ge 2$.

2. Choose an appropriate cost function $c\colon \Omega \times \Omega \to \mathbb{R}_+$ such that $f(\omega,x) \in \mathcal{F}_c$ for all $x \in X$.
Let $\hat c$ be the corresponding reduced cost function.

3. Reduce the original measure $P$ to a measure $Q$ supported at $r$ points in the set $\Omega_N$, i.e. find
a $\hat c$-optimal map $\nu \in \mathcal{M}_r(\Omega_N)$ such that $Q = \nu(P)$.

4. Let $C$ be any partition of $\Omega$ generated by the map $\nu$.

5. Let the map $g_{C,P} \in \mathcal{G}$ be a subset-optimal map corresponding to partition $C$.

Varying the value of parameter $r$ from 2 upwards one can obtain a series of maps in the set $\mathcal{G}$ that
are (approximately) Pareto-optimal. Step 2 of the above algorithm is essential for its feasibility.
For example, if the problem (1) is a linear multi-period stochastic optimization problem, the cost
function of the form (B12) can be used. In step 3, finding the measure $Q$ supported at $r$ points
that minimizes the value of the Monge-Kantorovich functional $\hat\mu_{\hat c}(P,Q)$ is an NP-hard problem [42],
but approximate algorithms such as the fast forward selection algorithm are available (see Appendix
C).
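Steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes that the reduction map sends every scenario to the nearest retained scenario under the chosen cost, which is how the optimal redistribution of Appendix C behaves; the function name `partition_from_kept` and the cost-matrix representation are illustrative choices, not notation from the text.

```python
def partition_from_kept(cost, kept):
    """Build the partition induced by mapping each scenario to its nearest kept scenario.

    cost[i][j] is the (reduced) cost between scenarios i and j (assumed representation);
    kept is the list of indices that support the reduced measure Q.
    Returns a dict: kept index -> list of scenario indices in that subset.
    """
    cells = {j: [] for j in kept}
    for i in range(len(cost)):
        # nearest retained scenario; a kept scenario maps to itself because cost[i][i] == 0
        nearest = min(kept, key=lambda j: cost[i][j])
        cells[nearest].append(i)
    return cells


# Four scenarios on a line, cost = absolute difference (hypothetical data)
points = [0.0, 1.0, 10.0, 11.0]
cost = [[abs(a - b) for b in points] for a in points]
cells = partition_from_kept(cost, kept=[0, 3])
```

Each value of `r` (the number of retained scenarios) gives one such partition, and hence one candidate map on the approximate frontier.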

Using the algorithm described above, one can obtain one approximately Pareto-optimal map for
each value of the chosen integer parameter. If more Pareto-optimal maps are needed (especially in
the region with lower values of pseudoenergy) additional heuristics can be used. For instance, one
could begin with the algorithm described above for some relatively high value of $r$ and then merge
some of the resulting subsets into one, giving rise to a partition with a lower value of $r$. Clearly,
this can be done in $B_r - 1$ ways, where $B_n$ is the $n$-th Bell number, which is just the number of
all different partitions of a set consisting of $n$ elements and which can be found from the recursive
relation $B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k$, $B_0 = 1$. (For example, the Bell numbers for the lower values of $n$
are $B_2 = 2$, $B_3 = 5$, $B_4 = 15$, $B_5 = 52$, $B_6 = 203$, $B_7 = 877$, $B_8 = 4140$.)
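The count of merged partitions can be checked with a short Python sketch. `bell_numbers` implements the recursion quoted above, and `coarsenings` (a hypothetical helper name) enumerates every partition obtainable by merging cells of a given partition, including the trivial case where nothing is merged.

```python
from math import comb


def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] using B_{n+1} = sum_k C(n, k) * B_k, B_0 = 1."""
    bell = [1]
    for n in range(n_max):
        bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
    return bell


def coarsenings(cells):
    """Yield all partitions obtained by merging the given cells (B_r of them for r cells)."""
    if not cells:
        yield []
        return
    first, rest = cells[0], cells[1:]
    for sub in coarsenings(rest):
        # keep `first` as a block of its own ...
        yield [list(first)] + sub
        # ... or merge it into one of the blocks already formed
        for idx in range(len(sub)):
            yield sub[:idx] + [sub[idx] + list(first)] + sub[idx + 1:]


bells = bell_numbers(8)
merged = list(coarsenings([[0], [1], [2], [3]]))
```

Dropping the one partition in which no cells are merged leaves the $B_r - 1$ alternatives mentioned in the text.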

We see that if the original chosen value of $r$ is not very high this would lead to a manageable
number of partitions. Additionally, scenario reduction can be used to reduce the computational
complexity of finding the values $x^*_{P_C}$ for subsets $C$ of the resulting partitions. On the other hand, if the
original value of $r$ makes evaluation of all maps that can be obtained this way computationally
prohibitive, a heuristic algorithm described by the following pseudo-code can be used. It finds
another partition, with a lower value of $r$, so that the subset merging procedure can be applied.

The goal of the algorithm represented by the pseudo-code is to identify subsets which are locally
compact but as far away from one another as possible. In each step $k$, we find the average distance
$c_p$ of each subset center remaining in the index set to only the other remaining centers.
The center, and therefore the associated subset, with the largest average distance is chosen and
removed from the set $J^{[k-1]}$. The remaining subsets are then merged into a single set.

Algorithm 1: Approximation to Pareto-optimal boundary.

Input:
    $C = \{C_1, \ldots, C_r\}$,
    $\{\omega_1, \ldots, \omega_r \mid \omega_i \in C_i\} \subset \Omega$,
    choose an integer $n$ such that $1 \le n \le r - 2$.
Step 0:
    $J^{[0]} := \{1, \ldots, r\}$,
    $C' = \{C'_1, \ldots, C'_{n+1}\}$ such that $C'_i := \emptyset$, $\forall i$,
    calculate $c_p(\omega_i, \omega_j)$, $i, j \in J^{[0]}$.
Step $k = 1, \ldots, n$:
    foreach $i \in J^{[k-1]}$ do
        $\bar c_p(i) := \sum_{j \in J^{[k-1]}} c_p(\omega_i, \omega_j)$,
    end
    $u_k := \arg\max_{i \in J^{[k-1]}} \bar c_p(i)$,
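The heuristic just described can be sketched in Python as follows. Since only part of the pseudo-code is available, the update steps follow the prose description (peel off the farthest center, then merge what remains); the function name `peel_far_subsets` and the representation of subsets as lists of indices are assumptions made for illustration.

```python
def peel_far_subsets(cells, center_cost, n):
    """Reduce a partition with r cells to n + 1 cells.

    cells: list of r subsets (lists of scenario indices).
    center_cost[i][j]: cost c_p between the centers of cells i and j.
    n: number of cells to single out, with 1 <= n <= r - 2.
    """
    r = len(cells)
    if not 1 <= n <= r - 2:
        raise ValueError("n must satisfy 1 <= n <= r - 2")
    remaining = list(range(r))
    result = []
    for _ in range(n):
        def average_distance(i):
            # average cost from center i to the other remaining centers only
            others = [j for j in remaining if j != i]
            return sum(center_cost[i][j] for j in others) / len(others)

        farthest = max(remaining, key=average_distance)
        remaining.remove(farthest)
        result.append(list(cells[farthest]))
    # all cells that were never singled out are merged into one subset
    result.append([s for i in remaining for s in cells[i]])
    return result


centers = [0.0, 1.0, 2.0, 100.0]  # hypothetical one-dimensional centers
center_cost = [[abs(a - b) for b in centers] for a in centers]
new_partition = peel_far_subsets([[0], [1], [2], [3]], center_cost, n=1)
```

The resulting partition has a lower value of `r`, so the exhaustive subset merging described earlier becomes affordable again.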

So far the pseudotemperature function $u$ has not been taken into account. It is clear, on the other
hand, that it will in general affect the composition of the set of Pareto-optimal maps. In order to
properly incorporate the pseudotemperature function into the heuristics described above, one could
note that the question difficulty is generally smaller when subsets with high pseudotemperature
values have large measures as well. In other words, if one wishes to keep the question difficulty low,
one should avoid creating subsets of small measure in regions of the parameter space characterized
by high pseudotemperature values. To facilitate creation of such subsets, one could, for example,
modify the (reduced) cost function $\hat c$ in the following way

$$\ldots\; f_c\bigl(u(\omega_i), u(\omega_j)\bigr),$$

where $f_c\colon \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ is some increasing function of its arguments. The specific shape of $f_c$ can
be determined experimentally, and several shapes can be tried for every given instance assuming
computational resources are not a limiting factor.






V. EXAMPLE 

Let us consider an example. The original problem is that of two-stage linear stochastic
optimization with simple recourse taken from a well-known textbook 4j]. The problem is for a



farmer to allocate the appropriate amount of land between wheat, corn and sugar beets in order 
to maximize profits. The farmer knows that at least 200 tons of wheat and 240 tons of corn must 
be grown for cattle feed. If not enough is grown to satisfy this demand, both wheat and corn can 
be bought for $238 and $210 per ton, respectively. Any excess above the demand can be sold for 
$170 and $150 per ton of wheat and corn, respectively. It costs $150 per acre to plant the wheat 
and $230 per acre to plant the corn. The farmer can also grow sugar beets that sell for $36 per 
ton. However, there is a quota of 6000 tons and any amount grown above this may only be sold at 
$10 per ton. It costs $260 per acre to plant sugar beets. The farmer has 500 acres available. 
The problem can be stated as:

$$\begin{aligned}
\text{minimize}\quad & 150x_1 + 230x_2 + 260x_3 + \mathbb{E}_P\, Q(x, s) \qquad \text{(FP)}\\
\text{subject to}\quad & x_1 + x_2 + x_3 \le 500,\\
& x_1, x_2, x_3 \ge 0,
\end{aligned}$$

where the second stage problem for a specific scenario can be written

$$\begin{aligned}
Q(x, s) = \text{minimize}\quad & \{238y_1 - 170w_1 + 210y_2 - 150w_2 - 36w_3 - 10w_4\}\\
\text{subject to}\quad & \omega_1(s)x_1 + y_1 - w_1 \ge 200,\\
& \omega_2(s)x_2 + y_2 - w_2 \ge 240,\\
& w_3 + w_4 \le \omega_3(s)x_3,\\
& w_3 \le 6000,\\
& y_1, y_2, w_1, w_2, w_3, w_4 \ge 0,
\end{aligned}$$

where $\omega_i(s)$ represents the yield of crop $i = 1, 2, 3$ for wheat, corn, and sugar beets, respectively,
under scenario $s$; $x_i$ are the acres of land to devote to each crop $i$; $y_1, y_2$ are tons of wheat and
corn, respectively, purchased to meet cattle feed requirements; $w_1, w_2, w_3, w_4$ are tons of wheat,
corn, sugar beets below quota, and sugar beets above quota, respectively, sold for profit.
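Because the recourse is simple, the second-stage problem can be evaluated in closed form rather than by solving a linear program. The Python sketch below relies on two observations that follow from the prices stated above: purchase prices exceed selling prices, so it is never optimal to buy and sell the same crop at once, and beets are sold at the higher price up to the quota. The function names and the allocation used in the usage line are illustrative, not taken from the text.

```python
def second_stage_cost(x, yields):
    """Closed-form value of Q(x, s) for acreage x and yields (wheat, corn, beets)."""
    wheat = yields[0] * x[0]
    corn = yields[1] * x[1]
    beets = yields[2] * x[2]
    cost = 0.0
    # wheat: buy any shortfall at 238, sell any surplus over 200 tons at 170
    cost += 238 * max(200 - wheat, 0) - 170 * max(wheat - 200, 0)
    # corn: buy any shortfall at 210, sell any surplus over 240 tons at 150
    cost += 210 * max(240 - corn, 0) - 150 * max(corn - 240, 0)
    # beets: 36 per ton up to the 6000-ton quota, 10 per ton above it
    cost -= 36 * min(beets, 6000) + 10 * max(beets - 6000, 0)
    return cost


def total_cost(x, yields):
    """Planting cost plus second-stage cost for a single scenario."""
    return 150 * x[0] + 230 * x[1] + 260 * x[2] + second_stage_cost(x, yields)


# Illustrative allocation evaluated at the average yields (2.5, 3, 20)
example = total_cost((120, 80, 300), (2.5, 3.0, 20.0))
```

A negative value of `total_cost` corresponds to a profit, since (FP) is written as a cost minimization.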

The problem has been modified in order to create the illustrative example used below. In this
example, only wheat and sugar beet yields are uncertain. Each is allowed to take five different
values of yields resulting in 25 scenarios. For the sake of convenience, we assume that the corn yield
is non-random and is equal to 3 tons per acre, while for both wheat and beets the average yield
equal to 2.5 and 20, respectively, has a probability of 0.30. The yield for both of these crops
can be either higher or lower than average by 20% with probability 0.20 and also can be higher or
lower than average by 30% with probability 0.15. The yields for wheat and beets are assumed to
be independent.

FIG. 3. Pseudotemperature function given for the farmer land allocation problem with uncertainty residing
in the yields of wheat and sugar beets.
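The scenario set described above can be generated with a few lines of Python. The sketch assumes that the stated probabilities 0.20 and 0.15 apply to each direction (higher and lower) separately, which is the reading under which the five probabilities sum to one; the function name `build_scenarios` is hypothetical.

```python
from itertools import product


def build_scenarios():
    """Return a list of ((wheat, corn, beets) yields, probability) for the 25 scenarios."""
    factors = [0.7, 0.8, 1.0, 1.2, 1.3]      # -30%, -20%, average, +20%, +30%
    probs = [0.15, 0.20, 0.30, 0.20, 0.15]   # assumed to apply per direction
    wheat = [2.5 * f for f in factors]
    beets = [20.0 * f for f in factors]
    scenarios = []
    # wheat and beet yields are independent, so joint probabilities multiply
    for (i, p_wheat), (j, p_beets) in product(enumerate(probs), repeat=2):
        scenarios.append(((wheat[i], 3.0, beets[j]), p_wheat * p_beets))
    return scenarios


scenarios = build_scenarios()
```

The first entry corresponds to the lowest wheat and beet yields and the last entry to the highest ones.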

The resulting uncertain yields are summarized below:

$$u(i,j) = i \cdot \ldots, \qquad i, j \in 1, \ldots, 5 \tag{22}$$

where $i, j$ are the indices referencing the uncertain yields of wheat and sugar beets, respectively
(where the smallest value of the uncertain yield corresponds to $i = 1$ ($j = 1$) and the largest
yield corresponds to $i = 5$ ($j = 5$)). The pseudotemperature function is then normalized so that
$\mathbb{E}_P\, u(i,j) = 1$. Fig. 3 shows a plot of the pseudotemperature function.

The efficient frontier can be approximated by using the scenario reduction based algorithm
described in the previous section together with subset merging heuristics. The resulting maps are
shown in Fig. 4 for the case of constant pseudotemperature. The resulting approximate efficient
frontiers, both for the constant pseudotemperature function and for the pseudotemperature shown
in Fig. 3, are shown in Fig. 5.

FIG. 4. Maps that are generated by scenario reduction for various values of r (solid dots), scenario reduction
for r = 5 with subsequent subset merging (crosses), scenario reduction to r = 10, reducing to r = 5 using
the pseudo-code and subsequent subset merging (circles). Pseudotemperature function is set to a constant.

FIG. 5. Approximate efficient frontiers for the constant pseudotemperature function (solid line) and
pseudotemperature shown in Fig. 3. (Horizontal axis: pseudo-energy.)

Now consider an information source described by the modified linear model with parameters
$b = 0.8$ and $Y_s = 0.2$ (which is a rather modest capacity value). We would like to find out how much
the original loss can be reduced by optimally using such an information source. In other words, we
want to solve problem (14). For this purpose one can take questions on the (approximate) efficient
frontier and plot parametric curves
$(G(\Omega, C, P, V_\alpha(C)), L(V_\alpha(C)))$, where $L(V_\alpha(C))$ is given by Proposition 2. The question yielding
the lowest point of intersection of such a curve with the vertical line $G = Y_s$ will give an approximate
solution of problem (14).

FIG. 6. Part of approximate efficient frontier and parametric loss curves for quasi-perfect answers to three
different questions for the case of constant pseudotemperature. (Horizontal axis: pseudo-energy.)

Results for the case of constant pseudotemperature are shown in Fig. 6. The parametric curves
for three questions (all three with $r = 2$) are produced. We can see that the lowest value of the
expected loss that can be obtained this way is equal to 7250, which constitutes a reduction of about
14%.

Results for the case of non-constant pseudotemperature are shown in Fig. 7. Analogously, three $r = 2$
questions were chosen on the approximate efficient frontier and the corresponding parametric curves
plotted. The best curve is observed to intersect the vertical line $G = 0.2$ at the value of vertical
coordinate equal to about 6900, which represents a reduction of about 18% compared to the EVPI
of 8450 of the original problem.

VI. CONCLUSION 

The main subject of this work is the development of approximate methods for solving the problem
of optimizing additional information acquisition in decision making problems with uncertainty
that are typically solved using stochastic optimization techniques. It represents a logical
continuation of the development in [...]. The problem is formulated as that
of finding the efficient frontier in the space of questions to the given information source and
determining optimal question(s) that allow for the loss reduction maximization for the problem the
agent is interested in solving.

FIG. 7. Part of approximate efficient frontier and parametric loss curves for quasi-perfect answers to three
different questions for the case of non-constant pseudotemperature shown in Fig. 3.

The solution methods proposed here are based on the method of probability metrics and
their application for scenario reduction in stochastic optimization (see appendices). The main 
idea is that, informally speaking, optimal scenario reduction on one hand and optimal information 
acquisition on the other hand are complementary processes. More specifically, in scenario reduction, 
the goal is to reproduce the overall shape of the original probability distribution as faithfully as 
possible with a probability measure of a smaller support - one strives to keep the "overall shape" 
of the distribution while leaving out the "small details". From the informational point of view, this
corresponds to searching for the least relevant information and adding it (updating the measure 
accordingly) - without finding it. In information acquisition, on the contrary, the goal is to find the 
most relevant information that, at the same time, is relatively easy (so that the question requesting 
it has sufficiently low difficulty) for the information source to supply - and, hence, it ends up being 
accurate. Therefore, if one has a method for finding the least relevant information, the same 
method can likely be made to work for finding the most relevant information. 

This observation provides for a means of development of simple approximate algorithms for de- 
termining the efficient frontier and for finding optimal questions for the given information source. 
The methods described here work for the class of linear multi-period two stage stochastic optimiza- 
tion problems and should generalize relatively easily to other problem classes for which scenario 
reduction based on probability metrics was shown to be possible, including chance constrained and
two-stage integer stochastic optimization problems. 

Appendix A: Proofs 
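1. Proof of Lemma 1

Let $i^* = \arg\min_i \sum_{j=1}^N p_j c_{ij}$ (so that $\min_i \sum_{j=1}^N p_j c_{ij} = \sum_{j=1}^N p_j c_{i^*j}$). Then we can write

$$\begin{aligned}
\sum_{i=1}^N \sum_{j=1}^N p_i p_j c_{ij}
&\overset{(a)}{\le} \sum_{i=1}^N \sum_{j=1}^N p_i p_j \left(c_{ii^*} + c_{i^*j}\right)\\
&= \sum_{i=1}^N \sum_{j=1}^N p_i p_j c_{ii^*} + \sum_{i=1}^N \sum_{j=1}^N p_i p_j c_{i^*j}\\
&= \sum_{j=1}^N p_j \sum_{i=1}^N p_i c_{ii^*} + \sum_{i=1}^N p_i \sum_{j=1}^N p_j c_{i^*j}\\
&\overset{(b)}{=} 2 \min_i \sum_{j=1}^N p_j c_{ij},
\end{aligned}$$

where (a) follows from the triangle inequality satisfied by the elements $c_{ij}$ and (b) follows from the
definition of $i^*$ (together with the symmetry of $c_{ij}$ and $\sum_i p_i = 1$).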



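2. Proof of Lemma 2

The first statement actually holds true for any measures $P, Q \in \mathcal{P}_c(\Omega)$ (see Appendix B for the
definition of $\mathcal{P}_c(\Omega)$). Indeed, let $f^*(\omega) \in \mathcal{F}_c$ be the function that achieves the maximum of

$$\left| \int_\Omega f(\omega)P(d\omega) - \int_\Omega f(\omega)Q(d\omega) \right| .$$

Let $f_j^*(\omega)$ be the restriction of $f^*(\omega)$ to $C_j$. Clearly, $f_j^*(\omega) \in \mathcal{F}_c(C_j)$. We can write

$$\begin{aligned}
\zeta_c(P,Q) &= \left| \int_\Omega f^*(\omega)P(d\omega) - \int_\Omega f^*(\omega)Q(d\omega) \right|\\
&\overset{(a)}{\le} \sum_{j=1}^r w_j \left| \int_{C_j} f^*(\omega)\,dP_{C_j}(\omega) - \int_{C_j} f^*(\omega)\,dQ_{C_j}(\omega) \right|\\
&\overset{(b)}{=} \sum_{j=1}^r w_j \left| \int_{C_j} f_j^*(\omega)\,dP_{C_j}(\omega) - \int_{C_j} f_j^*(\omega)\,dQ_{C_j}(\omega) \right|\\
&\overset{(c)}{\le} \sum_{j=1}^r w_j \zeta_c(P_{C_j}, Q_{C_j}),
\end{aligned}$$

where (a) follows from the definition of conditional measures $P_{C_j}$ and $Q_{C_j}$ (together with the triangle
inequality), (b) follows from the definition of functions $f_j^*(\omega)$, and (c) follows from the fact that
$f_j^*(\omega) \in \mathcal{F}_c(C_j)$ and the definition of $\zeta_c(P_{C_j}, Q_{C_j})$.

To prove the second statement, we can use the duality result (B5) together with (B10) that
relates the values of Kantorovich-Rubinstein and Monge-Kantorovich functionals. Let $\nu \in \mathcal{M}_r(\Omega_N)$
be the map that generates partition $C$, and let $\tilde\omega_j = \nu(\omega_i)$ for all $\omega_i \in C_j$. Note also that
$q_j = \sum_{\{i:\,\omega_i \in C_j\}} p_i = w_j$, $j = 1, \ldots, r$. We can write

$$\begin{aligned}
\sum_{j=1}^r w_j \zeta_c(P_{C_j}, Q_{C_j})
&\overset{(a)}{=} \sum_{j=1}^r w_j \hat\mu_{\hat c}(P_{C_j}, Q_{C_j})
\overset{(b)}{=} \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} \frac{p_i}{w_j}\, \hat c(\omega_i, \tilde\omega_j)\\
&= \sum_{j=1}^r \sum_{\{i:\,\omega_i \in C_j\}} p_i\, \hat c(\omega_i, \tilde\omega_j)
\overset{(c)}{=} \hat\mu_{\hat c}(P,Q) \overset{(d)}{=} \zeta_c(P,Q),
\end{aligned}$$

where (a) and (d) follow from (B5) and (B10), (b) follows from the fact that $Q_{C_j}$ is supported at a single
point $\tilde\omega_j$, and (c) follows from the way measure $Q$ was constructed as a reduction of the measure $P$
with a $\hat c$-optimal map.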
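3. Proof of Proposition 3

Let $w_j = P(C_j) = Q(C_j)$ be the measure of subsets in $C$ and let $P_j = P_{C_j}$ and $Q_j = Q_{C_j}$ be
the corresponding subset measures. We can write

$$\begin{aligned}
&\sum_{j=1}^r w_j \left[ \int_{C_j} f(\omega, x^*_{Q_j})P_j(d\omega) - \int_{C_j} f(\omega, x^*_\omega)P_j(d\omega) \right]\\
&\quad= \sum_{j=1}^r w_j \left[ \int_{C_j} f(\omega, x^*_{Q_j})P_j(d\omega) - \int_{C_j} f(\omega, x^*_\omega)P_j(d\omega)
 + \int_{C_j} f(\omega, x^*_{P_j})P_j(d\omega) - \int_{C_j} f(\omega, x^*_{P_j})P_j(d\omega) \right]\\
&\quad\overset{(a)}{=} L_P(C,P) + \sum_{j=1}^r w_j \left[ \int_{C_j} f(\omega, x^*_{Q_j})P_j(d\omega) - \int_{C_j} f(\omega, x^*_{P_j})P_j(d\omega) \right]\\
&\quad= L_P(C,P) + \sum_{j=1}^r w_j \left[ \int_{C_j} f(\omega, x^*_{Q_j})P_j(d\omega) - \int_{C_j} f(\omega, x^*_{P_j})P_j(d\omega)
 + \int_{C_j} f(\omega, x^*_{Q_j})Q_j(d\omega) - \int_{C_j} f(\omega, x^*_{Q_j})Q_j(d\omega) \right]\\
&\quad\overset{(b)}{=} L_P(C,P) + \sum_{j=1}^r w_j \left[ v(Q_j) - v(P_j) + \int_{C_j} f(\omega, x^*_{Q_j})(P_j - Q_j)(d\omega) \right]\\
&\quad\le L_P(C,P) + \sum_{j=1}^r w_j \left| v(Q_j) - v(P_j) \right| + \sum_{j=1}^r w_j \left| \int_{C_j} f(\omega, x^*_{Q_j})(P_j - Q_j)(d\omega) \right|\\
&\quad\overset{(c)}{\le} L_P(C,P) + K\sum_{j=1}^r w_j \zeta_c(P_j, Q_j) + K\sum_{j=1}^r w_j \zeta_c(P_j, Q_j)\\
&\quad= L_P(C,P) + 2K\sum_{j=1}^r w_j \zeta_c(P_j, Q_j)\\
&\quad\overset{(d)}{=} L_P(C,P) + 2K\zeta_c(P,Q),
\end{aligned}$$

where (a) follows from the definition of $L_P(C,P)$, (b) follows from the definition of the optimal
objective values $v(P_j)$ and $v(Q_j)$, (c) follows from the fact that the integrand $f(\omega,x)$ is in $\mathcal{F}_c$ and the definition
(B4) of the Fortet-Mourier metric $\zeta_c$, and (d) follows from Lemma 2.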
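4. Proof of Proposition 4

Let $w_j = P(C_j) = Q(C_j)$, $j = 1, \ldots, r$ be measures of subsets in $C$ and let $P_j$ and $Q_j$ be the
corresponding subset measures. We can write

$$\begin{aligned}
L(g_C,P,P) &= \sum_{j=1}^r w_j L(g_{P_j}, P_j) = \sum_{j=1}^r w_j \int_{C_j} \left( f(\omega, x^*_{P_j}) - f(\omega, x^*_\omega) \right) P_j(d\omega)\\
&= \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i \left( f(\omega_i, x^*_{P_j}) - f(\omega_i, x^*_{\omega_i}) \right)\\
&= \sum_{j=1}^r w_j \left( \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i f(\omega_i, x^*_{P_j}) - \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i f(\omega_i, x^*_{\omega_i}) \right)\\
&\overset{(a)}{=} \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i \left( v(P_j) - v(\delta_{\omega_i}) \right)\\
&\overset{(b)}{\le} K \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i\, \zeta_c(P_j, \delta_{\omega_i})
\overset{(c)}{=} K \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i\, \hat\mu_{\hat c}(P_j, \delta_{\omega_i})\\
&= K \sum_{j=1}^r w_j \sum_{\{i:\,\omega_i \in C_j\}} \sum_{\{k:\,\omega_k \in C_j\}} (P_j)_i (P_j)_k\, \hat c(\omega_i, \omega_k)\\
&\overset{(d)}{\le} 2K \sum_{j=1}^r w_j \min_{\{k:\,\omega_k \in C_j\}} \sum_{\{i:\,\omega_i \in C_j\}} (P_j)_i\, \hat c(\omega_i, \omega_k)\\
&\overset{(e)}{=} 2K \sum_{j=1}^r w_j \hat\mu_{\hat c}(P_j, Q_j) = 2K \sum_{j=1}^r w_j \zeta_c(P_j, Q_j) \overset{(f)}{=} 2K\zeta_c(P,Q),
\end{aligned}$$

where $(P_j)_i = p_i / w_j$ for $\omega_i \in C_j$, (a) follows from the definition of optimal values $v(P_j)$ and $v(\delta_{\omega_i})$, (b)
follows from the upper bound (B11), (c) follows from the duality relation (B5) and from the relation
(B10) between the Kantorovich-Rubinstein and Monge-Kantorovich functionals, (d) follows from
Lemma 1 (since $\hat c$ is a metric and $\{(P_j)_i\}_{\{i:\,\omega_i \in C_j\}}$ is a probability distribution), (e) follows from
the fact that $Q = \nu(P)$, where $\nu$ is $\hat c$-optimal, and (f) follows from Lemma 2. □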

Appendix B: Probability metrics and stability in stochastic optimization 

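Consider the problem (1). Let $\mathcal{P}(\Omega)$ be the set of all Borel probability measures on $\Omega$ and define

$$v(P) = \inf\left\{ \int_\Omega f(\omega,x)\,dP(\omega) : x \in X \right\}$$

and

$$S(P) = \left\{ x \in X : \int_\Omega f(\omega,x)\,dP(\omega) = v(P) \right\}$$

to be the optimal value and optimal solution set of (1), respectively.
Let us also define (as in, for example, [45])

$$\mathcal{F} = \{ f(\cdot,x) : x \in X \}$$

and

$$\mathcal{P}_{\mathcal{F}}(\Omega) = \Big\{ Q \in \mathcal{P}(\Omega) : -\infty < \int_\Omega \inf_{x \in X \cap \rho\mathbb{B}} f(\omega,x)Q(d\omega) \ \text{and}\
\sup_{x \in X \cap \rho\mathbb{B}} \int_\Omega f(\omega,x)Q(d\omega) < \infty, \ \text{for all } \rho > 0 \Big\},$$

where $\mathbb{B}$ is the closed unit ball in $\mathbb{R}^n$.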

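Then the probability distance of the form

$$d_{\mathcal{F},\rho}(P,Q) = \sup_{x \in X \cap \rho\mathbb{B}} \left| \int_\Omega f(\omega,x)P(d\omega) - \int_\Omega f(\omega,x)Q(d\omega) \right| \tag{B1}$$

can be defined on $\mathcal{P}_{\mathcal{F}}(\Omega)$. This distance is called Zolotarev's pseudometric with $\zeta$-structure [46-49].
The pseudometric (B1) would become a metric if the class $\mathcal{F}$ were rich enough so that $d_{\mathcal{F},\rho}(P,Q) = 0$
implies $P = Q$.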



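Theorem 2 in [50] states that if $P, Q \in \mathcal{P}_{\mathcal{F}}$ and $S(P)$ is nonempty and bounded then there exist
$\rho > 0$ and $\delta > 0$ such that

$$|v(P) - v(Q)| \le d_{\mathcal{F},\rho}(P,Q) \tag{B2}$$

is valid for all $Q \in \mathcal{P}_{\mathcal{F}}$ such that $d_{\mathcal{F},\rho}(P,Q) < \delta$.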

The distance $d_{\mathcal{F},\rho}$ in (B2) is typically difficult to handle since the class of functions $\mathcal{F}$ is
determined by the specific integrand $f(\omega,x)$ for the given instance of problem (1). The main idea
underlying the use of the probability metrics method for the study of stability and for scenario
reduction in stochastic programming is to suitably enlarge the class $\mathcal{F}$ so that it still shares its main
analytical properties with functions $f(\cdot,x)$. Such properly enlarged classes are sometimes referred
to as canonical classes and the corresponding metrics are sometimes called canonical metrics.

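Consider, for instance, the class $\mathcal{F}_c$ of continuous functions defined as

$$\mathcal{F}_c = \{ f\colon \Omega \to \mathbb{R} : |f(\omega) - f(\tilde\omega)| \le c(\omega,\tilde\omega), \ \text{for all } \omega, \tilde\omega \in \Omega \}, \tag{B3}$$

where $c\colon \Omega \times \Omega \to \mathbb{R}_+$ is a continuous symmetric function such that $c(\omega,\tilde\omega) = 0$ if and only if
$\omega = \tilde\omega$. Then the corresponding (pseudo-) metric has the form

$$\zeta_c(P,Q) = d_{\mathcal{F}_c}(P,Q) = \sup_{f \in \mathcal{F}_c} \left| \int_\Omega f(\omega)P(d\omega) - \int_\Omega f(\omega)Q(d\omega) \right| \tag{B4}$$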
and is known as the Fortet-Mourier metric. If the cost function $c(\omega,\tilde\omega)$ satisfies additional boundedness
and continuity conditions:

• $c(\omega,\tilde\omega) \le \lambda(\omega) + \lambda(\tilde\omega)$ for some $\lambda\colon \Omega \to \mathbb{R}_+$ mapping bounded sets into bounded sets,

• $\sup\{c(\omega,\tilde\omega) : \omega, \tilde\omega \in \mathbb{B}_\varepsilon(\omega_0),\ \|\omega - \tilde\omega\| \le \delta\} \to 0$ as $\delta \to 0$ for each $\omega_0 \in \Omega$, where $\mathbb{B}_\varepsilon(\omega_0)$ is
the $\varepsilon$-ball centered at $\omega_0$,

the Fortet-Mourier metric (B4) admits a dual representation as the Kantorovich-Rubinstein
functional

$$\zeta_c(P,Q) = \mathring\mu_c(P,Q) = \inf\left\{ \int_{\Omega\times\Omega} c(\omega,\tilde\omega)\,\eta(d\omega, d\tilde\omega) : \eta \in \mathcal{P}(\Omega \times \Omega),\ \pi_1\eta - \pi_2\eta = P - Q \right\}, \tag{B5}$$

where $\pi_1$ and $\pi_2$ denote projections on first and second components, respectively. It is
straightforward to show that the Kantorovich-Rubinstein functional (B5) can be upper-bounded by the
Monge-Kantorovich functional:

$$\mathring\mu_c(P,Q) \le \hat\mu_c(P,Q) = \inf\left\{ \int_{\Omega\times\Omega} c(\omega,\tilde\omega)\,\eta(d\omega, d\tilde\omega) : \eta \in \mathcal{P}(\Omega \times \Omega),\ \pi_1\eta = P,\ \pi_2\eta = Q \right\}, \tag{B6}$$

and that the bound becomes tight (i.e. $\mathring\mu_c(P,Q) = \hat\mu_c(P,Q)$) if the cost function $c(\omega,\tilde\omega)$ is a
metric on $\Omega$ [52]. The problem of finding the minimum in (B6) is known as the Monge-Kantorovich
mass transportation problem.

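Note that if measures $P$ and $Q$ are discrete ($P = \sum_{i=1}^N p_i\delta_{\omega_i}$ and $Q = \sum_{j=1}^M q_j\delta_{\tilde\omega_j}$), the
Monge-Kantorovich functional (B6) takes the following form:

$$\begin{aligned}
\hat\mu_c(P,Q) &= \min\left\{ \sum_{i=1}^N \sum_{j=1}^M c(\omega_i, \tilde\omega_j)\,\eta_{ij} : \eta_{ij} \ge 0,\ \sum_{i=1}^N \eta_{ij} = q_j,\ \sum_{j=1}^M \eta_{ij} = p_i\ \ \forall i, j \right\}\\
&= \max\left\{ \sum_{i=1}^N p_i u_i + \sum_{j=1}^M q_j v_j : u_i + v_j \le c(\omega_i, \tilde\omega_j)\ \ \forall i, j \right\}.
\end{aligned} \tag{B7}$$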

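Given the cost function $c(\omega,\tilde\omega)$ one can define the reduced cost $\hat c(\omega,\tilde\omega)$ on $\Omega \times \Omega$ by

$$\hat c(\omega,\tilde\omega) = \inf\left\{ \sum_{i=1}^{m-1} c(\omega_i, \omega_{i+1}) : m \in \mathbb{N},\ \omega_i \in \Omega,\ \omega_1 = \omega,\ \omega_m = \tilde\omega \right\}. \tag{B8}$$

It can easily be shown that the reduced cost function $\hat c(\omega,\tilde\omega)$ is a metric (since it satisfies the
triangle inequality) on $\Omega$ and that $\hat c(\omega,\tilde\omega) \le c(\omega,\tilde\omega)$ with the inequality being tight when $c(\omega,\tilde\omega)$
is also a metric.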
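On a finite set of scenarios the infimum in (B8) is an all-pairs shortest-path problem, which suggests the following Python sketch based on the Floyd-Warshall recursion. Restricting the intermediate points to the scenario set itself is an assumption made here for illustration; if chains may pass through other points of $\Omega$, the true reduced cost can be smaller than what this function returns.

```python
def reduced_cost(cost):
    """Reduced cost on a finite scenario set via Floyd-Warshall.

    cost[i][j] is c(omega_i, omega_j), assumed symmetric with zero diagonal.
    Chains are restricted to the listed scenarios (an assumption of this sketch).
    """
    n = len(cost)
    red = [row[:] for row in cost]
    for m in range(n):          # allow scenario m as an intermediate point
        for i in range(n):
            for j in range(n):
                via = red[i][m] + red[m][j]
                if via < red[i][j]:
                    red[i][j] = via
    return red


# A cost that violates the triangle inequality: squared distance on the line
points = [0.0, 1.0, 2.0]
cost = [[(a - b) ** 2 for b in points] for a in points]
red = reduced_cost(cost)
```

In this example the direct cost between the outer points is 4, while the chain through the middle point costs 1 + 1 = 2, so the reduced cost is strictly smaller than the original one.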
It can also be shown (see [48], chapter 4) that if $\Omega$ is compact with analytic sublevel sets then
the Kantorovich-Rubinstein functional (B5) with the reduced cost function $\hat c$ coincides with the
Kantorovich-Rubinstein functional with the original cost function $c$ (the result referred to as the
reduction theorem):

$$\mathring\mu_{\hat c}(P,Q) = \mathring\mu_c(P,Q). \tag{B9}$$

Since the reduced cost is a metric on $\Omega$ we have $\mathring\mu_{\hat c}(P,Q) = \hat\mu_{\hat c}(P,Q)$ and, comparing with (B9), we
conclude that, for compact parameter spaces with analytic sublevel sets, the relation

$$\mathring\mu_c(P,Q) = \hat\mu_{\hat c}(P,Q) \le \hat\mu_c(P,Q) \tag{B10}$$

holds true.

We thus arrive at the following useful stability result. If the integrand in problem (1) belongs
to class $\mathcal{F}_c$ for all $x \in X$ for some cost function $c$ satisfying additional boundedness and continuity
conditions described earlier in the appendix, then the estimate

$$|v(P) - v(Q)| \le \zeta_c(P,Q) = \mathring\mu_c(P,Q) = \hat\mu_{\hat c}(P,Q) \tag{B11}$$

is valid for Borel measures $P$ and $Q$ in $\mathcal{P}_c(\Omega)$ on compact $\Omega$ characterized with analytic sublevel
sets. (Here $\mathcal{P}_c(\Omega) = \{Q \in \mathcal{P}(\Omega) : \int_\Omega c(\omega,\omega_0)\,dQ(\omega) < \infty\}$ for some $\omega_0 \in \Omega$.)

The particular function $c(\omega,\tilde\omega)$ that plays an important role in the context of convex stochastic
optimization has the form

$$c_p(\omega,\tilde\omega) = \max\{1, \|\omega - \omega_0\|^{p-1}, \|\tilde\omega - \omega_0\|^{p-1}\}\,\|\omega - \tilde\omega\| \tag{B12}$$

for some $\omega_0 \in \Omega$. The corresponding metric $\zeta_p = \zeta_{c_p}$ is referred to as the $p$-th order Fortet-Mourier
metric.
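The cost (B12) is straightforward to evaluate numerically. The Python sketch below assumes the Euclidean norm and, by default, takes the reference point $\omega_0$ to be the origin; both are illustrative choices, since the text fixes neither.

```python
import math


def c_p(w, w_tilde, p, w0=None):
    """Cost function of the form (B12) with the Euclidean norm (assumed)."""
    if w0 is None:
        w0 = [0.0] * len(w)  # reference point defaults to the origin (assumption)

    def norm(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    growth = max(1.0, norm(w, w0) ** (p - 1), norm(w_tilde, w0) ** (p - 1))
    return growth * norm(w, w_tilde)


# For p = 1 the cost is the plain distance; larger p penalizes far-away scenarios more
d1 = c_p((3.0, 4.0), (0.0, 0.0), p=1)
d2 = c_p((3.0, 4.0), (0.0, 0.0), p=2)
```

For a problem with $l$ periods the text uses $p = l + 1$, so the growth factor has exponent $l$.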

To give an example of a class of problems for which the $p$-th order Fortet-Mourier metric is
relevant, consider linear multi-period stochastic optimization problems of the form

$$\min\Big\{ c\,y_0 + \mathbb{E}_P\Big[\min \sum_{j=1}^{l} c_j(\omega)y_j\Big] : y_0 \in X,\ y_j \in Y_j,\
W_{jj}y_j = b_j(\omega) - W_{j,j-1}(\omega)y_{j-1},\ j = 1, \ldots, l \Big\}, \tag{B13}$$

where the $Y_j$ are polyhedral sets. Problem (B13) can be written in the form (1) with the
integrand $f(\omega,x)$ given by

$$f(\omega,x) = cx + \inf\Big\{ \sum_{j=1}^{l} c_j(\omega)y_j : y_j \in Y_j,\ W_{jj}y_j = b_j(\omega) - W_{j,j-1}(\omega)y_{j-1},\
j = 1, \ldots, l \Big\} = cx + \Phi_1(\omega,x),$$

where the function $\Phi_1(\omega,x)$ is defined recursively:

$$\Phi_j(\omega, u_{j-1}) = \inf\{ c_j(\omega)y_j + \Phi_{j+1}(\omega, y_j) : y_j \in Y_j,\ W_{jj}y_j = u_{j-1} \}$$

for $j = l, \ldots, 1$ and $\Phi_{l+1}(\omega, y_l) = 0$.

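It is shown in [45] that if $b_j(\omega) - W_{j,j-1}(\omega)x \in W_{jj}Y_j$ for all pairs $(\omega,x)$ (relatively complete
recourse) and $\ker(W_{jj}) \cap Y_j^\infty = \{0\}$ for $j = 1, \ldots, l-1$ (where $Y_j^\infty$ denotes the horizon cone [53] of
$Y_j$) then there exists a constant $\hat K$ such that

$$|f(\omega,x) - f(\tilde\omega,x)| \le \hat K \max\{1, \rho, \|\omega\|^l, \|\tilde\omega\|^l\}\,\|\omega - \tilde\omega\| \tag{B14}$$

for all $\omega, \tilde\omega \in \Omega$ and $x \in X \cap \rho\mathbb{B}$. This implies that $\frac{1}{\hat K \max\{1,\rho\}} f(\omega,x) \in \mathcal{F}_{c_{l+1}}$ for all $\omega, \tilde\omega \in \Omega$ and
$x \in X \cap \rho\mathbb{B}$.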


It is now straightforward to obtain the following result ([45]). Let $v(P)$ be the optimal value of
problem (B13). Assume that the relatively complete recourse condition for (B13) is satisfied and
that $\ker(W_{jj}) \cap Y_j^\infty = \{0\}$ for $j = 1, \ldots, l-1$. Then there exists a constant $K > 0$ such that the
estimate

$$|v(P) - v(Q)| \le K\zeta_{l+1}(P,Q) \tag{B15}$$

is valid for any $P, Q \in \mathcal{P}_{l+1}(\Omega)$. (Here $\mathcal{P}_{l+1}(\Omega)$ denotes the set of Borel measures on $\Omega$ with finite
$(l+1)$-th order moments.)

Specifying the general result (B11) to the cost function of the form (B12) with $p = l + 1$ we can
rewrite the estimate (B15) for the difference in optimal objective values of problem (B13) as

$$|v(P) - v(Q)| \le K\mathring\mu_{c_{l+1}}(P,Q) = K\hat\mu_{\hat c_{l+1}}(P,Q), \tag{B16}$$

where $K > 0$ is some constant.



Appendix C: Scenario reduction algorithms 

The goal of scenario reduction algorithms is, given a stochastic optimization problem of the form (1)
characterized by a discrete measure $P = \sum_{i=1}^N p_i\delta_{\omega_i}$, to find the discrete measure $Q = \sum_{j=1}^M q_j\delta_{\tilde\omega_j}$
such that $M < N$ and the difference in the optimal objective values $|v(P) - v(Q)|$ is as small as
possible.

If the stochastic optimization problem has the form (B13) of a linear multi-period problem then,
as discussed in Appendix B, under the relatively complete recourse assumption, the upper bound
(B16) can be shown to hold. This motivates searching for discrete measures $Q$ that minimize the
distance $\hat\mu_{\hat c_{l+1}}(P,Q)$ (or $\mathring\mu_{c_{l+1}}(P,Q)$).

Thus the optimal scenario reduction problem based on the method of probability metrics can
be formulated as follows [50]. Let $J \subset \{1, 2, \ldots, N\}$ and consider the measure $Q = \sum_{j \notin J} q_j\delta_{\omega_j}$
supported at points $\omega_j$, $j \in \{1, 2, \ldots, N\} \setminus J$. The measure $Q$ is said to be reduced from $P$ by
deleting scenarios $\omega_j$, $j \in J$ and by assigning new probabilities $q_j$ to the remaining scenarios. The
optimal reduction concept proposed in [50] seeks the minimum value of the functional

$$D(J;q) = \hat\mu_{c_p}\Big( \sum_{i=1}^N p_i\delta_{\omega_i},\ \sum_{j \notin J} q_j\delta_{\omega_j} \Big). \tag{C1}$$

It is shown in [50] that, for set $J$ fixed, the optimal weights $\bar q$ are straightforward to find:

$$\bar q_j = p_j + \sum_{i \in J_j} p_i \quad \text{for each } j \notin J, \tag{C2}$$

where $J_j := \{i \in J : j = j(i)\}$ and $j(i) \in \arg\min_{j \notin J} c_p(\omega_i, \omega_j)$ for each $i \in J$. The corresponding
minimum of the functional $D(J;q)$ is

$$D_J = \min\Big\{ D(J;q) : q_j \ge 0,\ \sum_{j \notin J} q_j = 1 \Big\} = \sum_{i \in J} p_i \min_{j \notin J} c_p(\omega_i, \omega_j).$$
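The redistribution rule (C2) and the value $D_J$ can be computed together, as in the following Python sketch. The function name `redistribute` and the cost-matrix representation are illustrative; ties in the nearest-scenario choice are broken by the lowest index, which is one admissible choice of $j(i)$.

```python
def redistribute(p, cost, deleted):
    """Optimal weights (C2) and distance D_J for a fixed set of deleted scenarios.

    p: original probabilities; cost[i][j]: c_p(omega_i, omega_j);
    deleted: indices in J. Returns (weights on kept scenarios, D_J).
    """
    deleted = set(deleted)
    kept = [j for j in range(len(p)) if j not in deleted]
    q = {j: p[j] for j in kept}
    d_j = 0.0
    for i in sorted(deleted):
        # j(i): a kept scenario closest to the deleted scenario i
        nearest = min(kept, key=lambda j: cost[i][j])
        q[nearest] += p[i]                 # its probability mass moves there
        d_j += p[i] * cost[i][nearest]     # and contributes to D_J
    return q, d_j


points = [0.0, 1.0, 10.0, 11.0]           # hypothetical scenarios on a line
cost = [[abs(a - b) for b in points] for a in points]
q, d_j = redistribute([0.25, 0.25, 0.25, 0.25], cost, deleted=[1, 2])
```

Here each deleted scenario passes its probability to the neighbor at distance 1, so the two kept scenarios end up with weight 0.5 each.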

On the other hand, the optimal choice of the set $J$ of given cardinality $|J| = k$,

$$\min\Big\{ D_J = \sum_{i \in J} p_i \min_{j \notin J} c_p(\omega_i, \omega_j) : J \subset \{1, 2, \ldots, N\},\ |J| = k \Big\},$$

is a combinatorial problem, and it is unlikely that efficient solution algorithms for arbitrary value
of $k$ are available. However, the cases $k = 1$ and $k = N - 1$ are easy to solve to optimality and they can
be used to formulate heuristic algorithms for other values of $k$. The fast forward scenario reduction
algorithm proposed in [42] proceeds as follows.

Fast forward selection algorithm:

Step 1: $c^{[1]}_{ku} := c_p(\omega_k, \omega_u)$, $k, u = 1, \ldots, N$,
        $z^{[1]}_u := \sum_{k=1}^{N} p_k c^{[1]}_{ku}$, $u = 1, \ldots, N$,
        $u_1 \in \arg\min_{u \in \{1,\ldots,N\}} z^{[1]}_u$, $J^{[1]} := \{1, \ldots, N\} \setminus \{u_1\}$.

Step i: $c^{[i]}_{ku} := \min\{c^{[i-1]}_{ku}, c^{[i-1]}_{ku_{i-1}}\}$, $k, u \in J^{[i-1]}$,
        $z^{[i]}_u := \sum_{k \in J^{[i-1]} \setminus \{u\}} p_k c^{[i]}_{ku}$, $u \in J^{[i-1]}$,
        $u_i \in \arg\min_{u \in J^{[i-1]}} z^{[i]}_u$, $J^{[i]} := J^{[i-1]} \setminus \{u_i\}$.

Step n + 1: Redistribution by (C2).
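The selection steps can be sketched in Python as below. The sketch follows Steps 1 to $n$ as listed and leaves the final redistribution (C2) to a separate routine; the function name `fast_forward_selection`, the in-place update of the cost table, and the lowest-index tie-breaking are implementation choices made here, not part of the algorithm's statement.

```python
def fast_forward_selection(p, cost, n):
    """Select n scenarios to keep; return (selected indices in order, deleted indices).

    p: scenario probabilities; cost[k][u]: c_p(omega_k, omega_u).
    """
    c = [row[:] for row in cost]
    remaining = set(range(len(p)))   # J^{[i]}: scenarios not selected so far
    selected = []
    for step in range(n):
        if step > 0:
            last = selected[-1]
            # cost of serving k from u, given that `last` is already available
            for k in remaining:
                for u in remaining:
                    c[k][u] = min(c[k][u], c[k][last])
        # z_u: distance to P if u were added to the selected set
        best = min(
            sorted(remaining),
            key=lambda u: sum(p[k] * c[k][u] for k in remaining if k != u),
        )
        selected.append(best)
        remaining.discard(best)
    return selected, sorted(remaining)


points = [0.0, 1.0, 10.0, 11.0]           # hypothetical scenarios on a line
cost = [[abs(a - b) for b in points] for a in points]
selected, deleted = fast_forward_selection([0.1, 0.5, 0.3, 0.1], cost, n=2)
```

The returned `deleted` list plays the role of the index set $J$, to which the redistribution rule (C2) is then applied.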